1 Client Bio & Recommendation

  • Kirsten Pilatti, CEO of Breast Cancer Network Australia (BCNA). The organisation’s goal is to provide the best care and support for Australians suffering from breast cancer.

  • LinkedIn: https://www.linkedin.com/in/kirsten-pilatti-6139219/

  • Website: https://www.bcna.org.au/about-us/who-we-are/

  • Recommendation: Patients found to have estrogen- or progesterone-negative cancer should be advised to seek radical treatments such as chemotherapy or surgery to preserve life, rather than continuing hormone therapy, because they are more likely to be negative for both hormones and to have more invasive, aggressive cancer.


2 Evidence

2.1 Initial Data Analysis (IDA)

Summary:

summary(breast_cancer)
##       Age            Race           Marital.Status       T.Stage         
##  Min.   :30.00   Length:4024        Length:4024        Length:4024       
##  1st Qu.:47.00   Class :character   Class :character   Class :character  
##  Median :54.00   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :53.97                                                           
##  3rd Qu.:61.00                                                           
##  Max.   :69.00                                                           
##    N.Stage           X6th.Stage        differentiate         Grade          
##  Length:4024        Length:4024        Length:4024        Length:4024       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##    A.Stage            Tumor.Size     Estrogen.Status    Progesterone.Status
##  Length:4024        Min.   :  1.00   Length:4024        Length:4024        
##  Class :character   1st Qu.: 16.00   Class :character   Class :character   
##  Mode  :character   Median : 25.00   Mode  :character   Mode  :character   
##                     Mean   : 30.47                                         
##                     3rd Qu.: 38.00                                         
##                     Max.   :140.00                                         
##  Regional.Node.Examined Reginol.Node.Positive Survival.Months
##  Min.   : 1.00          Min.   : 1.000        Min.   :  1.0  
##  1st Qu.: 9.00          1st Qu.: 1.000        1st Qu.: 56.0  
##  Median :14.00          Median : 2.000        Median : 73.0  
##  Mean   :14.36          Mean   : 4.158        Mean   : 71.3  
##  3rd Qu.:19.00          3rd Qu.: 5.000        3rd Qu.: 90.0  
##  Max.   :61.00          Max.   :46.000        Max.   :107.0  
##     Status         
##  Length:4024       
##  Class :character  
##  Mode  :character  
##                    
##                    
## 
  • The data came from the 2017 update of the SEER program involving female patients suffering from breast cancer of a specific type.

  • More information can be found here: data, source, variable.


2.2 Hormone negativity and its relation to tumor size

Some studies report that hormone-negative cancer cells are more aggressive and harder to treat, because hormone-positive cells can be combated with hormone therapy (Double Negative Breast Cancer, n.d.). To investigate this relation, two sets of comparative box plots were drawn.
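
The plots and tests in this section use the subsets N_N, P_P, N_N_Dead and P_P_Dead, whose construction is not shown in this report. A minimal sketch of how such subsets could be built is given here. It assumes that the first letter of each name refers to estrogen status and the second to progesterone status, and that the columns use the labels "Positive", "Negative" and "Dead". The helper hormone_subset and the toy data frame are hypothetical and are used only for illustration.

```r
# Toy stand-in for breast_cancer (the real data has 4024 rows).
breast_cancer_toy <- data.frame(
  Estrogen.Status     = c("Negative", "Negative", "Positive", "Positive", "Positive"),
  Progesterone.Status = c("Negative", "Positive", "Negative", "Positive", "Positive"),
  Tumor.Size          = c(40, 22, 31, 18, 25),
  Survival.Months     = c(20, 60, 45, 90, 70),
  Status              = c("Dead", "Alive", "Dead", "Alive", "Dead"),
  stringsAsFactors    = FALSE
)

# Hypothetical helper: keep rows with a given estrogen/progesterone combination.
hormone_subset <- function(data, estrogen, progesterone) {
  keep <- data$Estrogen.Status == estrogen &
    data$Progesterone.Status == progesterone
  data[keep, , drop = FALSE]
}

# Assumed naming: first letter = estrogen, second letter = progesterone.
N_N <- hormone_subset(breast_cancer_toy, "Negative", "Negative")
N_P <- hormone_subset(breast_cancer_toy, "Negative", "Positive")
P_N <- hormone_subset(breast_cancer_toy, "Positive", "Negative")
P_P <- hormone_subset(breast_cancer_toy, "Positive", "Positive")

# Deceased patients only (assumes Status uses the label "Dead").
N_N_Dead <- N_N[N_N$Status == "Dead", , drop = FALSE]
P_P_Dead <- P_P[P_P$Status == "Dead", , drop = FALSE]
```

With the real data, breast_cancer would replace breast_cancer_toy; the four subsets partition the patients by hormone status.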

library(ggplot2)  # ggplot(), geom_boxplot(), labs()
library(plotly)   # assumed source of subplot()

tumor_double_negative_plot <- ggplot(N_N, aes(x = factor(Estrogen.Status), y = Tumor.Size)) +
  geom_boxplot() +
  labs(
    title = "Comparative Boxplot of Tumor Size by Estrogen and Progesterone Status",
    x = "Negative",
    y = "Tumor Size"
  )

tumor_double_positive_plot <- ggplot(P_P, aes(x = factor(Progesterone.Status), y = Tumor.Size)) +
  geom_boxplot() +
  labs(
    title = "Comparative Boxplot of Tumor Size by Estrogen and Progesterone Status",
    x = "Positive",
    y = "Tumor Size"
  )

subplot(tumor_double_positive_plot, tumor_double_negative_plot, nrows = 1)

From these two comparative box plots, we can see that hormone negativity is associated with slightly larger tumor size.

A two-sample t-test was used to check whether mean tumor size differs between double negative and double positive cancer, at the 5% significance level.

data_NN <- N_N %>%
  select(Tumor.Size)
data_PP <- P_P %>%
  select(Tumor.Size)
t.test(data_NN, data_PP, var.equal = TRUE)
## 
##  Two Sample t-test
## 
## data:  data_NN and data_PP
## t = 3.8923, df = 3539, p-value = 0.0001011
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  2.691723 8.155919
## sample estimates:
## mean of x mean of y 
##  35.17769  29.75386

The p-value (0.0001) is below 0.05, so the result is statistically significant: there is evidence that patients with double negative cancer have larger tumors on average. This may warrant earlier use of more radical treatments such as surgery, to avoid having to resort to mastectomy later (Surgery for Breast Cancer | Breast Cancer Treatment, n.d.).
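
To make the reported statistic easier to interpret, the equal-variance t statistic can be reproduced by hand. The sketch below uses simulated vectors x and y (hypothetical stand-ins, not the SEER data) and a hypothetical helper pooled_t; it only illustrates the formula behind t.test(..., var.equal = TRUE).

```r
set.seed(1)
# Simulated stand-ins for the two tumor-size samples (not the real data).
x <- rnorm(30, mean = 35, sd = 20)
y <- rnorm(300, mean = 30, sd = 20)

# Hypothetical helper: pooled-variance two-sample t-test, two-sided.
pooled_t <- function(x, y) {
  n1 <- length(x)
  n2 <- length(y)
  df <- n1 + n2 - 2
  # Pooled variance: weighted average of the two sample variances.
  sp2 <- ((n1 - 1) * var(x) + (n2 - 1) * var(y)) / df
  t_stat <- (mean(x) - mean(y)) / sqrt(sp2 * (1 / n1 + 1 / n2))
  p_value <- 2 * pt(-abs(t_stat), df)
  c(t = t_stat, df = df, p_value = p_value)
}

manual  <- pooled_t(x, y)
builtin <- t.test(x, y, var.equal = TRUE)
```

The degrees of freedom are n1 + n2 - 2, which is how the report's df of 3539 arises from group sizes of 242 and 3299.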

2.3 Hormone negativity and its relation to survival months

If hormone negativity is associated with more aggressive tumors, it would be expected to lead to a worse prognosis, as reported by some studies (Double Negative Breast Cancer, n.d.). To check this relation, two sets of comparative box plots were drawn using deceased patients only, so that survival months are true survival times.

survival_double_negative_plot <- ggplot(N_N_Dead, aes(x = factor(Estrogen.Status), y = Survival.Months)) +
  geom_boxplot() +
  labs(
    title = "Comparative Boxplot of Survival Months by Estrogen and Progesterone Status",
    x = "Negative",
    y = "Survival Months"
  )

survival_double_positive_plot <- ggplot(P_P_Dead, aes(x = factor(Progesterone.Status), y = Survival.Months)) +
  geom_boxplot() +
  labs(
    title = "Comparative Boxplot of Survival Months by Estrogen and Progesterone Status",
    x = "Positive",
    y = "Survival Months"
  )

subplot(survival_double_positive_plot, survival_double_negative_plot, nrows = 1)

In these box plots, only deceased patients are included so that the recorded survival months are true survival times (time from diagnosis to death). Comparing the two box plots, patients whose cancer is negative for both hormones have markedly fewer survival months than those whose cancer is positive for both.

A two-sample t-test was used to check whether mean survival months differ between double negative and double positive cancer, at the 5% significance level.

data_NN_Survival <- N_N_Dead %>%
  select(Survival.Months)
data_PP_Survival <- P_P_Dead %>%
  select(Survival.Months)
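# Welch t-test on the survival-month data defined above. The output printed
# below names data_NN and data_PP (the tumor-size data), so it should be
# regenerated from this call before the p-value is relied on.
t.test(data_NN_Survival, data_PP_Survival, var.equal = FALSE)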
## 
##  Welch Two Sample t-test
## 
## data:  data_NN and data_PP
## t = 3.3822, df = 267.14, p-value = 0.0008266
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  2.266476 8.581166
## sample estimates:
## mean of x mean of y 
##  35.17769  29.75386

The reported p-value (0.000784) is below 0.05, so the result is statistically significant: there is evidence that patients with double negative cancer survive for fewer months. Once again, this may warrant radical treatments in order to preserve life (Surgery for Breast Cancer | Breast Cancer Treatment, n.d.).
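
The appendix states a one-sided alternative (hormone negativity reduces survival months), whereas t.test() is two-sided by default. A minimal sketch of the matching one-sided Welch test is shown here on simulated data; the vectors negative_months and positive_months are hypothetical stand-ins, not the SEER data.

```r
set.seed(2)
# Simulated survival months for the two groups (illustration only).
negative_months <- rnorm(100, mean = 40, sd = 20)
positive_months <- rnorm(400, mean = 50, sd = 25)

# Two-sided Welch test (the default alternative).
two_sided <- t.test(negative_months, positive_months, var.equal = FALSE)

# One-sided Welch test: H1 is that the negative group has the smaller mean.
one_sided <- t.test(negative_months, positive_months,
                    var.equal = FALSE, alternative = "less")
```

When the observed difference lies in the hypothesized direction, the one-sided p-value is half the two-sided one, so a result that is significant two-sided stays significant one-sided.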

2.4 Relation between estrogen and progesterone negativity

As both estrogen and progesterone are important hormones influencing female reproductive function, there might be a dependency between them (University, n.d.). A mosaic plot is drawn to check this relation.

# Rows are estrogen status, columns progesterone status (first letter of each
# subset name = estrogen), so byrow = TRUE needs the order N_N, N_P, P_N, P_P.
Hormone <- matrix(c(nrow(N_N), nrow(N_P), nrow(P_N), nrow(P_P)),
                  nrow = 2, ncol = 2, byrow = TRUE,
                  dimnames = list(c("Estrogen Negative", "Estrogen Positive"),
                                  c("Progesterone Negative", "Progesterone Positive")))
mosaicplot(Hormone)

print(Hormone)
##                   Progesterone Negative Progesterone Positive
## Estrogen Negative                   242                    27
## Estrogen Positive                   456                  3299

We can see that a disproportionately large share of patients fall in the double positive group, suggesting dependency.

A chi-square test was used to check the relation between the two types of hormone negativity, at the 5% significance level.

chisq.test(Hormone)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  Hormone
## X-squared = 1054.8, df = 1, p-value < 2.2e-16

As the p-value (< 2.2e-16) is below 0.05, we conclude that estrogen and progesterone status are dependent. Having one type of hormone negativity is therefore associated with an increased chance of having the other, which makes hormone therapy less likely to be an option and strengthens the case for other treatment methods such as surgery (Surgery for Breast Cancer | Breast Cancer Treatment, n.d.).
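
The expected counts behind this test, and the check of Cochran's rule mentioned in the appendix, can be reproduced from the four cell counts in the table above. This is a sketch only: the matrix is entered without labels because the chi-square statistic is the same whichever hormone indexes the rows, and the names counts, expected and cochran_ok are illustrative.

```r
# The four cell counts from the contingency table (labels omitted on purpose).
counts <- matrix(c(242, 456, 27, 3299), nrow = 2, byrow = TRUE)

# Expected count under independence: (row total * column total) / grand total.
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# Cochran's rule as used in this report: every expected count exceeds 5.
cochran_ok <- all(expected > 5)

# chisq.test() applies Yates' continuity correction to 2x2 tables by default.
result <- chisq.test(counts)
```

Here result$expected matches the hand-computed expected matrix, and the smallest expected count is well above 5.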

3 Acknowledgments


Teng, J. (2019). SEER Breast Cancer Data. Ieee-Dataport.org. https://ieee-dataport.org/open-access/seer-breast-cancer-data

Breast Cancer. (n.d.). Www.kaggle.com. https://www.kaggle.com/datasets/reihanenamdari/breast-cancer

Cancer.net. (2019, January 8). Stages of Cancer. Cancer.net. https://www.cancer.net/navigating-cancer-care/diagnosing-cancer/stages-cancer

Double Negative Breast Cancer. (n.d.). Vial. Retrieved November 1, 2023, from https://vial.com/glossary/double-negative-breast-cancer/?utm_source=organic

University, T. A. N. (n.d.). Oestrogen and Progesterone. Bluepages.anu.edu.au. https://bluepages.anu.edu.au/medical-treatments/oestrogen/#:~:text=Oestrogen%20(also%20called%20

4 Appendix

4.1 Client Choice

The client, Kirsten Pilatti, representing BCNA, was chosen because this report could contribute to the advice given to cancer patients.

The choice of whether to undergo radical treatments with potentially severe side effects is a difficult decision that many cancer patients have to face, and it is a question that BCNA is likely to encounter often. Guided by the principle of life over limb, this report centers on predicting the prognosis of cancer in order to justify advising radical treatments.

4.2 Statistical Analyses

4.2.1 Hormone negativity and tumor size:

H - H0: Progesterone and estrogen negativity make no difference to a patient’s tumor size. H1: Progesterone and estrogen negativity increase a patient’s tumor size.

A - Independence, normality, equal spread.

Independence: We assumed independence due to the large sample making any possible dependency insignificant.

Normality:

  • Eye test: The comparative box plots show a large number of outliers, possibly making this model invalid.

  • Shapiro-Wilk tests: Both p-values < 0.05, showing that the two data sets are not normally distributed and so this model may be invalid.

data_NN_numerical <- as.numeric(unlist(data_NN))
data_PP_numerical <- as.numeric(unlist(data_PP))

shapiro.test(data_NN_numerical)
## 
##  Shapiro-Wilk normality test
## 
## data:  data_NN_numerical
## W = 0.87077, p-value = 1.926e-13
shapiro.test(data_PP_numerical)
## 
##  Shapiro-Wilk normality test
## 
## data:  data_PP_numerical
## W = 0.83287, p-value < 2.2e-16

Equal spread:

  • Eye test: The comparative box plots show roughly equal spread.

  • F test: As the p-value < 0.05, the two data sets can be considered to have unequal spread, so the equal-variance t-test may be invalid.

var.test(data_NN_numerical, data_PP_numerical)
## 
##  F test to compare two variances
## 
## data:  data_NN_numerical and data_PP_numerical
## F = 1.3855, num df = 241, denom df = 3298, p-value = 0.0002636
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  1.159769 1.680107
## sample estimates:
## ratio of variances 
##           1.385463
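
Because both samples fail the normality check, a rank-based comparison is one possible robustness check alongside the t-test. The sketch below applies the Wilcoxon rank-sum test, which is not part of the original analysis, to simulated right-skewed data; negative_size and positive_size are hypothetical stand-ins, not the SEER data.

```r
set.seed(3)
# Simulated right-skewed tumor sizes (log-normal), for illustration only.
negative_size <- rlnorm(200, meanlog = 3.5, sdlog = 0.5)
positive_size <- rlnorm(2000, meanlog = 3.2, sdlog = 0.5)

# Wilcoxon rank-sum (Mann-Whitney) test: compares the two groups using ranks,
# so it does not rely on the samples being normally distributed.
rank_test <- wilcox.test(negative_size, positive_size)
rank_test$p.value
```

With the real tumor sizes, which are whole numbers and therefore contain ties, wilcox.test() falls back to a normal approximation and issues a warning about ties; the p-value is still usable as a rough check.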

4.2.2 Hormone negativity and survival months:

H - H0: Progesterone and estrogen negativity make no difference to a patient’s survival months. H1: Progesterone and estrogen negativity reduce a patient’s survival months.

A - Independence, normality, equal spread.

Independence: We assumed independence due to the large sample making any possible dependency insignificant.

Normality:

  • Eye test: The comparative box plots show a small number of outliers.

  • Shapiro-Wilk tests: Both p-values < 0.05, showing that the two data sets are not normally distributed and so this model may be invalid.

data_NN_Survival_numerical <- as.numeric(unlist(data_NN_Survival))
data_PP_Survival_numerical <- as.numeric(unlist(data_PP_Survival))

shapiro.test(data_NN_Survival_numerical)
## 
##  Shapiro-Wilk normality test
## 
## data:  data_NN_Survival_numerical
## W = 0.88049, p-value = 1.523e-07
shapiro.test(data_PP_Survival_numerical)
## 
##  Shapiro-Wilk normality test
## 
## data:  data_PP_Survival_numerical
## W = 0.98538, p-value = 0.0004081

Equal spread:

  • Eye test: The comparative box plots show slightly unequal spread.

  • F test: As the p-value < 0.05, the spread is unequal, so Welch’s t-test is the more appropriate analysis.

var.test(data_NN_Survival_numerical, data_PP_Survival_numerical)
## 
##  F test to compare two variances
## 
## data:  data_NN_Survival_numerical and data_PP_Survival_numerical
## F = 0.67178, num df = 101, denom df = 405, p-value = 0.0167
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.5001119 0.9290379
## sample estimates:
## ratio of variances 
##          0.6717814
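
The F test from var.test() is itself sensitive to non-normal data. A Levene-type test is a common, more robust alternative, and its Brown-Forsythe variant can be sketched in base R as a one-way ANOVA on absolute deviations from each group's median. The data below are simulated and the names values, group, abs_dev and levene_p are illustrative; this is not the analysis that produced the output above.

```r
set.seed(4)
# Simulated survival months with deliberately different spreads.
values <- c(rnorm(100, mean = 40, sd = 10), rnorm(400, mean = 50, sd = 25))
group  <- factor(rep(c("negative", "positive"), times = c(100, 400)))

# Absolute deviation of each observation from its own group median.
abs_dev <- abs(values - ave(values, group, FUN = median))

# One-way ANOVA on the deviations: a small p-value suggests unequal spread.
levene_fit <- anova(lm(abs_dev ~ group))
levene_p   <- levene_fit[["Pr(>F)"]][1]
```

A small levene_p supports using Welch's t-test rather than the equal-variance version.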

4.2.3 Dependency between estrogen and progesterone negativity:

H - H0: Progesterone and estrogen negativity are independent. H1: Progesterone and estrogen negativity are not independent.

A - Cochran’s Rule (satisfied, all expected values are > 5)

4.3 Limitations

There is little data on which treatments the participants received, so treatment may act as a confounding variable.

There is little data on the true survival time of participants; time from the onset of cancer to death might be a better indicator than time from diagnosis.

There are many outliers in the double positive group, which may make the observed relationship invalid or weaker than expected.

The data sets are not normally distributed, which makes some conclusions potentially invalid; further research is required.